AITopics | deductive output

PLDR-LLMs Learn A Generalizable Tensor Operator That Can Replace Its Own Deep Neural Net At Inference

Gokden, Burc

arXiv.org Artificial Intelligence, Feb-22-2025

February 18, 2025

Abstract

We show that the Large Language Model from Power Law Decoder Representations (PLDR-LLM) is a foundational model whose deductive outputs are invariant tensors up to a small perturbation. PLDR-LLM learns a singularity condition for the deductive outputs that enables the once-inferred energy-curvature tensor $G_{LM}$ to replace the deep neural network of power law graph attention (PLGA) that generates the deductive outputs at inference. We demonstrate that a cache for $G_{LM}$ (G-cache) and a KV-cache can be implemented in a straightforward manner to improve inference time. The invariance and generalizability of the deductive outputs hold at very high fidelity: the deductive outputs have the same RMSE and determinant values up to 15 decimal places after caching, and zero-shot benchmark scores remain unchanged. Ablation studies show that learned deductive outputs have loss and accuracy characteristics distinct from those of models pretrained with transferred, randomly initialized, or identity tensors as a constant tensor operator, and that an LLM with scaled dot-product attention (SDPA) is a special case of PLDR-LLM where $G_{LM}$ is predefined as identity. The observed invariance characteristic introduces a novel asymmetry between the training and inference phases with caching. We outline the observed common characteristics of the deductive outputs for the learned singularity condition. We provide an implementation of a training and inference framework for PLDR-LLM with KV-cache and G-cache.

1 Introduction

Large Language Model from Power Law Decoder Representations (PLDR-LLM) is a novel language model architecture with well-defined deductive and inductive outputs [Gokden, 2024]. It is composed of deep layers of decoders with multi-headed Power Law Graph Attention (PLGA) [Gokden, 2021, 2019]. The deductive outputs are intended to observe and regularize the model, while the inductive output is the next-token prediction of a language model.

PLGA is a series of non-linear and linear transformations that attend to an input sentence, which can be considered a weighted graph $G = (V, E)$ whose nodes are the tokens, densely represented in an N-dimensional embedding space. PLGA learns a metric tensor $A_{LM}$ of the embedding space after applying a custom fully connected layer and iSwiGLU, a positive semi-definite activation function, to the output $A$ of a deep residual network of gated linear units (GLUs) whose input is a density matrix operator derived from the query.
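
The two claims in the abstract that lend themselves to a small illustration are that SDPA is the special case in which the tensor operator is the identity, and that a once-inferred operator can be cached and reused. The NumPy sketch below is only a toy under stated assumptions: it assumes the operator enters the attention scores in a bilinear form, `q @ g @ k.T`, which the excerpt above does not spell out, it omits masking and multiple heads, and `operator_attention`, `GCache`, and `toy_infer` are hypothetical names rather than parts of the published implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sdpa(q, k, v):
    # Standard scaled dot-product attention (single head, no mask).
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v

def operator_attention(q, k, v, g):
    # ASSUMPTION: the tensor operator g acts bilinearly between queries
    # and keys. This form is illustrative, not taken from the source.
    d = q.shape[-1]
    return softmax(q @ g @ k.T / np.sqrt(d)) @ v

class GCache:
    """Hypothetical cache: infer the operator once, then reuse it."""

    def __init__(self, infer_fn):
        self._infer_fn = infer_fn  # stand-in for the network producing g
        self._g = None
        self.calls = 0             # how many times infer_fn actually ran

    def get(self, x):
        if self._g is None:
            self._g = self._infer_fn(x)
            self.calls += 1
        return self._g

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 8)) for _ in range(3))

# With g set to the identity, the bilinear form reduces to SDPA.
identity_matches_sdpa = np.allclose(
    operator_attention(q, k, v, np.eye(8)), sdpa(q, k, v)
)

def toy_infer(x):
    # Hypothetical placeholder for an expensive operator-producing network.
    return np.eye(x.shape[-1]) + 0.01 * (x.T @ x) / x.shape[0]

cache = GCache(toy_infer)
g_first = cache.get(q)   # computed on the first call
g_second = cache.get(k)  # served from the cache, toy_infer is not rerun
```

Reusing `g_first` for later inputs only makes sense if the operator is (nearly) independent of the input, which is the invariance property the abstract reports; the toy `toy_infer` does not have that property and is here only to show the caching pattern.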

deductive output, generalizable tensor operator, pldr-llm, (12 more...)

arXiv.org Artificial Intelligence

2502.13502

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Oregon > Multnomah County > Portland (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
(3 more...)

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment (0.68)
Media > Film (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

PLDR-LLM: Large Language Model from Power Law Decoder Representations

Gokden, Burc

arXiv.org Artificial Intelligence, Oct-22-2024

We present the Large Language Model from Power Law Decoder Representations (PLDR-LLM), a language model that leverages non-linear and linear transformations through the Power Law Graph Attention mechanism to generate well-defined deductive and inductive outputs. We pretrain PLDR-LLMs of varying layer sizes with a small batch size of 32 and $\sim$8B tokens from the RefinedWeb dataset, and show that they achieve competitive performance in zero-shot and few-shot settings compared to scaled dot-product LLMs of similar model size reported in the literature. We show that the deductive outputs of PLDR-LLMs can be used to compare model characteristics or to improve performance by introducing the Directed Acyclic Graph (DAG) loss as a metric and regularizer. Our results indicate that the initial maximum learning rate and warm-up steps have a lasting impact on the deductive outputs throughout pretraining. We provide a detailed description of the PLDR-LLM architecture, its implementation, and the pretraining procedure.
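
The abstract does not give the formula for the DAG loss, so the sketch below should not be read as the paper's definition. As a stand-in, it uses a well-known acyclicity measure from the structure-learning literature (the trace-exponential function used by NOTEARS), $h(A) = \mathrm{tr}(e^{A \circ A}) - d$, which is zero exactly when the weighted adjacency matrix $A$ describes a DAG and positive otherwise. The function names and the truncated power series are assumptions made for illustration.

```python
import numpy as np

def trace_expm(m, terms=30):
    # Approximate tr(exp(m)) with a truncated power series:
    # tr(I) + tr(m) + tr(m^2)/2! + ...
    d = m.shape[0]
    term = np.eye(d)
    total = float(d)  # k = 0 term: trace of the identity
    for k in range(1, terms):
        term = term @ m / k
        total += np.trace(term)
    return total

def acyclicity(a):
    # Stand-in acyclicity measure: h(a) = tr(exp(a * a)) - d,
    # where a * a is the elementwise square of the adjacency matrix.
    d = a.shape[0]
    return trace_expm(a * a) - d

# Strictly upper-triangular adjacency: edges only go "forward", so no cycles.
dag = np.triu(np.ones((4, 4)), k=1)

# Adding an edge from node 3 back to node 0 closes a cycle.
cyclic = dag.copy()
cyclic[3, 0] = 1.0

h_dag = acyclicity(dag)        # zero for an acyclic graph
h_cyclic = acyclicity(cyclic)  # strictly positive once a cycle exists
```

A quantity of this kind can serve both roles mentioned in the abstract: reported as a number it is a metric on a deductive output, and added to the training objective with a weight it acts as a regularizer.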

large language model, machine learning, pldr-llm, (18 more...)

arXiv.org Artificial Intelligence

2410.16703

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Oregon > Multnomah County > Portland (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(5 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
